source("wildfires_wrangling.R")
The Evolution of Wildfires: California
Wildfires Throughout the Years (1992-2020)
In the last several decades there has been a noticeable upward trend in wildfires, not just in the United States but around the world. People are concerned, and for good reason: when the land is dry, fires can spread quickly and be hard to contain.
California in particular has been a hot topic. The state has long been known for its many fires, but what has changed to make the situation more dangerous? What trends can be found in the data? And are fires really becoming more dangerous, or are we only now paying more attention?
Accessing the Data
The data I am using for this project is from the US Department of Agriculture (more specifically, Karen C. Short) and can be found here: https://www.fs.usda.gov/rds/archive/catalog/RDS-2013-0009.6
Preparing the Data
The dataset I am working with is quite large, with more than 2,300,000 fire records. To make it easier to work with, I broke the data up into smaller subsets. During this process I created a few new variables: discovery, containment, fire_duration, and fire_severity.
discovery - combines the date and time of day for when a fire was discovered.
containment - combines the date and time of day for when a fire was contained.
fire_duration - the time from discovery to containment, in minutes.
fire_severity - the size of the fire (in acres) multiplied by its duration (in minutes), so the unit is acre-minutes.
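The four derived variables described above can be sketched in base R as follows. This is a minimal illustration on toy data, not the contents of wildfires_wrangling.R: the raw column names (discovery_date, discovery_time, cont_date, cont_time), the date format, and the "HHMM" time format are all assumptions.

```r
# Toy data; column names and formats are assumptions, not the real schema.
fires <- data.frame(
  discovery_date = c("2020-08-16", "2020-09-04"),
  discovery_time = c("0630", "1815"),
  cont_date      = c("2020-08-16", "2020-09-06"),
  cont_time      = c("0900", "1815"),
  fire_size      = c(2.5, 1000)  # acres
)

# Hypothetical helper: combine a date string and an "HHMM" time string
to_datetime <- function(date, time) {
  as.POSIXct(paste(date, time), format = "%Y-%m-%d %H%M", tz = "UTC")
}

fires$discovery   <- to_datetime(fires$discovery_date, fires$discovery_time)
fires$containment <- to_datetime(fires$cont_date, fires$cont_time)

# Duration in minutes from discovery to containment
fires$fire_duration <- as.numeric(
  difftime(fires$containment, fires$discovery, units = "mins")
)

# Severity = size (acres) * duration (minutes)
fires$fire_severity <- fires$fire_size * fires$fire_duration
```

In this toy example the first fire lasts 150 minutes and the second lasts 2,880 minutes (two days), which gives severities of 375 and 2,880,000 acre-minutes.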
Finally, Working With the Data
# creating a basic bar plot to view the number of fires (by size) over the years
ggplot(fod_CA, aes(x = fire_year, fill = fire_size_class)) +
geom_bar() +
labs(
title = "30 Years of Californian Wildfires",
x = "Year",
y = "Count",
fill = "Fire Size Class",
caption = "Data provided by USDA") +
scale_fill_brewer(palette="YlOrRd")
Given the above bar chart, it would be easy to dismiss the big question. Although 2020 is the most recent year, it was not the year with the most fires. That honor falls to 2007. However, if we look more closely at class G fires (5,000+ acres), a more interesting trend may emerge.
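A reading taken from a bar chart can also be checked numerically by tabulating fires per year. The sketch below uses a small toy data frame in place of fod_CA; only the column names fire_year and fire_size_class are taken from the plots above, and the values are made up for illustration.

```r
# Toy stand-in for fod_CA (values are invented for illustration)
toy <- data.frame(
  fire_year       = c(2006, 2007, 2007, 2007, 2020, 2020),
  fire_size_class = c("A", "B", "A", "G", "G", "A")
)

# Number of fires recorded in each year
fires_per_year <- table(toy$fire_year)

# Year with the highest count (returned as a character label)
busiest_year <- names(fires_per_year)[which.max(fires_per_year)]

# Same tabulation restricted to class G fires
large_per_year <- table(toy$fire_year[toy$fire_size_class == "G"])
```

On the real data, the same two tabulations would show whether the year with the most fires overall is also the year with the most class G fires.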
# filtering the data
fod_CA_G <- fod_CA |> filter(fire_size_class == "G")
# making essentially the same plot, but focusing on the largest fire sizes
ggplot(fod_CA_G, aes(x = fire_year, fill = fire_size_class)) +
geom_bar() +
labs(
title = "California's Largest Fires",
x = "Year",
y = "Count",
fill = "Fire Size Class",
caption = "Data provided by USDA")
Now let’s bring this back and look at not just the size classification, but the actual size of the fires throughout the years.
ggplot(fod_CA, aes(x = fire_year, y = fire_size, color = fire_size_class)) +
geom_point() +
labs(
title = "California's Largest Fires: Take 2",
x = "Year",
y = "Size (acres)",
color = "Fire Size Class",
caption = "Data provided by USDA")+
scale_color_brewer(palette="YlOrRd")
So How Do Severity and Duration Factor In?
Using fire_size along with the fire_duration and fire_severity variables I created earlier, I calculated yearly averages for a handful of years.
year <- c(1992,1995,2000,2005,2010,2015,2016,2017,2018,2019,2020)
size <- c(
mean(CA_1992$fire_size, na.rm=TRUE),
mean(CA_1995$fire_size, na.rm=TRUE),
mean(CA_2000$fire_size, na.rm=TRUE),
mean(CA_2005$fire_size, na.rm=TRUE),
mean(CA_2010$fire_size, na.rm=TRUE),
mean(CA_2015$fire_size, na.rm=TRUE),
mean(CA_2016$fire_size, na.rm=TRUE),
mean(CA_2017$fire_size, na.rm=TRUE),
mean(CA_2018$fire_size, na.rm=TRUE),
mean(CA_2019$fire_size, na.rm=TRUE),
mean(CA_2020$fire_size, na.rm=TRUE))
duration <- c(
mean(CA_1992$fire_duration, na.rm=TRUE),
mean(CA_1995$fire_duration, na.rm=TRUE),
mean(CA_2000$fire_duration, na.rm=TRUE),
mean(CA_2005$fire_duration, na.rm=TRUE),
mean(CA_2010$fire_duration, na.rm=TRUE),
mean(CA_2015$fire_duration, na.rm=TRUE),
mean(CA_2016$fire_duration, na.rm=TRUE),
mean(CA_2017$fire_duration, na.rm=TRUE),
mean(CA_2018$fire_duration, na.rm=TRUE),
mean(CA_2019$fire_duration, na.rm=TRUE),
mean(CA_2020$fire_duration, na.rm=TRUE))
severity <- c(
mean(CA_1992$fire_severity, na.rm=TRUE),
mean(CA_1995$fire_severity, na.rm=TRUE),
mean(CA_2000$fire_severity, na.rm=TRUE),
mean(CA_2005$fire_severity, na.rm=TRUE),
mean(CA_2010$fire_severity, na.rm=TRUE),
mean(CA_2015$fire_severity, na.rm=TRUE),
mean(CA_2016$fire_severity, na.rm=TRUE),
mean(CA_2017$fire_severity, na.rm=TRUE),
mean(CA_2018$fire_severity, na.rm=TRUE),
mean(CA_2019$fire_severity, na.rm=TRUE),
mean(CA_2020$fire_severity, na.rm=TRUE))
df <- data.frame(year, size, duration, severity)
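The same kind of per-year table can be built without one mean() call per year and column by grouping a single data frame by year. The sketch below is an alternative, not the approach used above. It assumes all years live in one data frame that already carries the fire_size, fire_duration, and fire_severity columns, and it uses toy values and base R's aggregate().

```r
# Toy data: two years, with one missing duration (values are invented)
toy <- data.frame(
  fire_year     = c(1992, 1992, 2020, 2020),
  fire_size     = c(10, 30, 100, 300),
  fire_duration = c(60, NA, 120, 240)
)
toy$fire_severity <- toy$fire_size * toy$fire_duration

# Mean of each column per year; na.rm = TRUE is applied column by column,
# matching mean(x, na.rm = TRUE) on each yearly subset
summary_df <- aggregate(
  toy[c("fire_size", "fire_duration", "fire_severity")],
  by  = list(year = toy$fire_year),
  FUN = mean,
  na.rm = TRUE
)
names(summary_df) <- c("year", "size", "duration", "severity")

# Keep only the years of interest (hypothetical selection)
years_of_interest <- c(1992, 2020)
summary_df <- summary_df[summary_df$year %in% years_of_interest, ]
```

Passing the columns and grouping variable directly, rather than using the formula interface, matters here: the formula method of aggregate() drops any row with a missing value in any column before averaging, which would change the mean size for 1992 in this toy example.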
df |> arrange(desc(severity))
   year      size  duration   severity
1  2020 417.08324 2581.1955 36539834.5
2  2018 172.40278 1009.9193 19888390.8
3  2015 115.35523 1828.7477  7998538.3
4  2016  72.93364  708.2207  6815913.4
5  2019  45.64936 2585.4358  4540254.9
6  2017 142.61927 1267.2733  4231720.2
7  2000  37.97286  526.9906  1427894.1
8  2010  15.40284 1861.9513  1140790.2
9  1992  27.36955  693.9354   606623.1
10 1995  29.27364  653.2748   582316.9
11 2005  25.83049  591.6193   451918.4
Here we can see that 2020, the most recent year, comes out on top with the most severe fires on average. Not only that, but the six most recent years in the sample (2015 through 2020) take the top six spots, while every sampled year from 2010 and earlier falls below them.
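To put the gap in perspective, the average severities can be compared as a ratio. The two values below are copied from the table above; the variable names are only illustrative.

```r
# Average severity (acre-minutes), copied from the table above
severity_2020 <- 36539834.5
severity_1992 <- 606623.1

# How many times more severe the average 2020 fire was than the average 1992 fire
severity_ratio <- severity_2020 / severity_1992
round(severity_ratio)  # about 60
```

Because severity is a product of size and duration, a small number of very large, long-burning fires can pull a yearly average up sharply, so this ratio is best read alongside the size and duration columns.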
# displaying the above df as a connected scatter plot
ggplot(df, aes(x = year, y = severity)) +
geom_line(color = "grey") +
# adding red circles at the datapoints
geom_point(shape = 21, color = "black", fill = "red", size = 6) +
labs(
title = "Evolution of Fire Severity",
x = "Year",
y = "Severity",
caption = "Data provided by USDA")
Gaining Perspective
# selecting which feature IDs to show on the map
IDs <- c("oid","latitude","longitude","state","fips_name",
"owner_descr","fire_code", "fire_name","discovery",
"containment","fire_duration","fire_severity","fire_size",
"fire_size_class","nwcg_cause_classification",
"nwcg_general_cause","nwcg_cause_age_category")
# creating an interactive map that will plot CA fires from 2020 via latitude and longitude
# each point on the map is clickable and will pull up features associated with that point
mapviewOptions(basemaps = "OpenStreetMap.DE") #<- limits which maps can be chosen
map_2020 <- CA_2020 |> mapview(
xcol = "longitude",
ycol = "latitude",
zcol = "fire_size_class", # coloring by fire size class
col.regions = brewer.pal(7, "YlOrRd"),
cex = "fire_size", # points on the map will vary by fire size (in acres)
crs = "NAD83", # coordinate system
grid = FALSE,
popup = popupTable(CA_2020, zcol = IDs),
layer.name = "Fire Size Class",
legend = TRUE)
map_2020